
********************************************************************************
REPLICATION DATA AND CODE FOR: 
School Discipline Disparities Increase when Neighborhood Black Population Changes 

AUTHORS: 
Jennifer Candipan (Brown University)
Chantal A. Hailey (The University of Texas at Austin)


- Replication Package successfully tested on Dec 12 2025 (Stata v.19.5; reghdfe v.6.x)

********************************************************************************


--------------------------------------------------------------------------------
OVERVIEW AND SCOPE
--------------------------------------------------------------------------------
This replication package contains the data and code necessary to reproduce the 
findings in the Main Text of the manuscript.

The analysis flows from the final analysis dataset (`sa_analysis_final.dta`). 
We also provide the semi-processed intermediate datasets used to construct 
this final file for transparency.

NOTE ON SUPPLEMENTARY INFORMATION (SI):
In accordance with the policy regarding code "central to the findings," this 
package focuses on the core analysis. The Supplementary Information (SI) 
contains robustness checks and descriptive figures that utilize the same 
underlying dataset and similar estimation strategies as the main text. 
Code for these derivative exhibits is not included to prioritize the 
accessibility of the core replication scripts.

NOTE ON TABLE OUTPUT:
The code outputs raw tables (in .txt format) containing the 
coefficients, standard errors, and N. These raw outputs were manually 
formatted in Excel to create the final tables appearing in the manuscript. 
The values match, but the formatting (fonts, lines, layout) will differ.
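
For orientation, a minimal sketch of how a raw .txt table of this kind can be 
written with esttab (part of the estout package listed under SOFTWARE 
REQUIREMENTS). The built-in auto dataset, the model, and the file name 
example_table.txt are illustrative assumptions; the commands actually used in 
`03_tables.do` may differ.

```stata
* Illustrative only: built-in example data, not the replication data
sysuse auto, clear
eststo clear
eststo m1: regress price mpg weight

* Write coefficients with standard errors to a plain-text file;
* esttab reports the number of observations (N) by default
esttab m1 using "example_table.txt", b(3) se(3) replace
```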

--------------------------------------------------------------------------------
FILE STRUCTURE
--------------------------------------------------------------------------------
CODE FILES (Located in /code/):
- 01_main.do        : Master script. Defines globals and runs all analysis steps.
- 02_labelvars.do   : Defines value labels and variable descriptions.
- 03_tables.do      : Generates all Tables (Main and Appendix).
- 04_figures.do     : Generates all Figures (Main and Appendix).

DATA FILES (Located in /data/):
1. sa_analysis_final.dta
   - The final harmonized dataset used for all regressions and figures.
   - See the "describevars" text file for the variable list and descriptions.
   
2. crdcraw.dta 
   - Semi-raw CRDC data containing select variables for 2010 and 2018.
   
3. ccd10_18.dta 
   - Semi-cleaned NCES Common Core of Data (CCD) file with select variables 
     for school years 2009-10 and 2017-18.

4. sabvars_wide.dta 
   - SAB-apportioned variables. Contains tract-level measures reapportioned 
     to 2010 School Attendance Boundaries (SABs) via population weights.
   
5. tractvars2010_18.dta 
   - Select Census Tract variables (ACS 2006-10 and 2014-18).
   - Source: Downloaded via Social Explorer.
   - Note: Data uses 2010 tract boundaries. 
   


--------------------------------------------------------------------------------
SOFTWARE REQUIREMENTS
--------------------------------------------------------------------------------
- Stata (must be version 17 or higher)
- Required user-written packages:
  - estout
  - reghdfe (must be version 5 or 6)
  - ftools (must be up to date as of Dec 2025)
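
A minimal sketch for installing and checking the required packages, assuming 
the versions currently hosted on SSC are compatible with this package (not 
verified here):

```stata
* Install the required user-written packages from SSC
ssc install estout, replace
ssc install ftools, replace
ssc install reghdfe, replace

* Report the installed reghdfe version and the running Stata version
which reghdfe
display c(stata_version)
```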

--------------------------------------------------------------------------------
INSTRUCTIONS TO REPLICATORS
--------------------------------------------------------------------------------

IMPORTANT: Stata accepts forward slashes (/) in file paths on every operating 
system. Mac users may need to change any back slashes (\) to forward slashes (/) 
in all global macros that assign paths and subfolders. 

1. Download and unzip the repository to your local machine.
2. Open `01_main.do` in Stata.
3. Edit the `global project` path to match your local folder.
4. Ensure that the final analysis file (sa_analysis_final.dta) is in the "data" subfolder of the root folder.
5. Ensure that your Stata version is 17 or higher. 
6. Ensure that all user-written packages are installed and compatible. 
7. Run `01_main.do`. This will automatically:
   - Set file paths.
   - Run the labeling program (`02_labelvars.do`).
   - Execute the analysis (`03_tables.do` and `04_figures.do`).
   - Save outputs to the `/tables/` and `/figures/` subfolders.

Alternatively, `03_tables.do` and `04_figures.do` can be run on their own 
PROVIDED that the global macros defined in `01_main.do` are already set in the 
current Stata session.
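
The steps above can be sketched as follows. The root folder path is a 
hypothetical placeholder, and the sketch assumes the /code/ and /data/ layout 
described under FILE STRUCTURE. Because `01_main.do` defines `global project` 
itself, the path must still be edited inside that file.

```stata
* Hypothetical root folder: replace with the location of the unzipped package
global project "/Users/yourname/replication_package"

* Stop early if the final analysis file is not where the scripts expect it
confirm file "$project/data/sa_analysis_final.dta"

* Run the master script, which calls the other do-files
do "$project/code/01_main.do"
```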


--------------------------------------------------------------------------------
DATA AVAILABILITY AND SOURCES
--------------------------------------------------------------------------------
This replication package contains the final analysis dataset (`sa_analysis_final.dta`). 
The raw data used to construct this file were drawn from the following public sources.

1. School Discipline Data (Outcome Measures)
   Source: U.S. Department of Education Civil Rights Data Collection (CRDC)
   Years: 2009-10 and 2017-18
   Access: Publicly available at https://ocrdata.ed.gov/
   
   IMPORTANT COVERAGE NOTES:
   - The 2017-18 CRDC is a near-universal collection of all public schools.
   - The 2009-10 CRDC was a sample, not a full census. It included approximately 
     7,000 districts and 72,000 schools (covering about 85% of the nation's 
     students). It heavily oversampled large districts and those with high 
     minority enrollment. Our analysis is restricted to schools present in 
     this sample.

2. School Attendance Boundaries (Neighborhood Definitions)
   Source: School Attendance Boundary Information System (SABINS), Version 1.0
   Year: 2010 boundaries (used for both time periods to maintain consistent geography)
   Access: Available from the Minnesota Population Center at http://www.sabinsdata.org/
   Citation: The College of William and Mary and the Minnesota Population Center (2011).
   
   IMPORTANT COVERAGE NOTES:
   - SABINS does not cover the entire United States. It includes school attendance 
     boundaries (SABs) for approximately 800-900 school districts, covering the 
     majority of the US population in metropolitan areas but omitting many rural 
     districts and districts without defined attendance zones (e.g., full open 
     enrollment).
   - Linking Process: We used the "Grade 10" SABINS-to-school crosswalk provided 
     by SABINS to link boundaries to high schools. Users attempting to replicate 
     this merge must select the 10th-grade specific file.
   
   Note: SABs and schools have a many-to-many match. SABINS also provides 2010 
   census block-to-school boundaries crosswalks as a tabular data file. This 
   information can be used to calculate the population weights assigned to each SAB, 
   facilitating the reapportionment of census tract information to school 
   attendance boundaries. The SABINS microsite also includes related products 
   that may be of interest to researchers, such as point files on 2009-10 school 
   location and archived 2009-10 NCES Common Core of Data products and code. 
   All data are publicly available, but users must first register with the NHGIS 
   site and agree to conditions of use. The site states: 
   “All persons are granted a limited license to use data and documentation from 
   IPUMS NHGIS, subject to the following conditions...” 
   
   See the NHGIS/SABINS site for more information about SABINS-related data products: 
   https://www.nhgis.org/sabins-public-school-data

   NHGIS Citation: 
   Jonathan Schroeder, David Van Riper, Steven Manson, Katherine Knowles, Tracy 
   Kugler, Finn Roberts, and Steven Ruggles. IPUMS National Historical Geographic 
   Information System: Version 20.0 [dataset]. Minneapolis, MN: IPUMS. 2025. 
   http://doi.org/10.18128/D050.V20.0

3. School Characteristics (Controls)
   Source: National Center for Education Statistics (NCES) Common Core of Data (CCD)
   Years: 2009-10 and 2017-18
   Access: Publicly available at https://nces.ed.gov/ccd/pubschuniv.asp

4. Neighborhood Characteristics 
   Source: 5-year American Community Survey estimates via Social Explorer
   Datasets: 
     - American Community Survey (ACS) 5-Year Estimates: 2006-2010 and 2014-2018
   Access: Publicly available via Social Explorer (https://www.socialexplorer.com/) 
   or IPUMS NHGIS (https://www.nhgis.org/).
   
   Note: Social Explorer requires an institutional license to access the data; 
   NHGIS requires users to register with their site. Both sites require users 
   to accept conditions before accessing and using the data. 

5. Neighborhood Population and Geography 
   Source: U.S. Census Bureau via IPUMS NHGIS
   Datasets: 
     - 2010 Decennial Census (block and tract-level population counts for 
       SAB-apportionment and population weights)
     - American Community Survey (ACS) 5-Year Estimates: 2006-2010 and 2014-2018
   Access: Publicly available via IPUMS NHGIS (https://www.nhgis.org/).

6. CBSA Geographic Crosswalk (to identify MSA)
   Source: Missouri Census Data Center (MCDC)
   Dataset: 2010 CBSA to 2010 Census Tract Crosswalk 
   Access: Generated via Geocorr 2018 (Geographic Correspondence Engine) at 
   https://mcdc.missouri.edu/applications/geocorr2018.html
   Note: Publicly available. The crosswalk can be used to match CBSAs to tracts 
   (or blocks) and then to SABs.

--------------------------------------------------------------------------------
SAMPLE SELECTION CRITERIA
--------------------------------------------------------------------------------
   The analysis sample was restricted to schools meeting the following criteria:
   - Traditional public high schools serving at least ten 10th-grade students 
     in both 2010 and 2018.
   - Located within a single School Attendance Boundary (SAB).
   - Enrolled at least one Black student, but not an all-Black student body, 
     in both years (to allow for within-school racial comparisons).
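
   A minimal sketch of the enrollment-based restrictions on toy data. All 
   variable names are hypothetical, and the traditional-public and single-SAB 
   criteria are not shown because they depend on fields not described here.

```stata
clear
input school g10_2010 g10_2018 black_2010 black_2018 total_2010 total_2018
1  250  260   40   55  1000  1050
2    8  120   10   12   400   420
3  300  310    0   15  1200  1250
4  150  140  600  580   600   590
end

* At least ten 10th-grade students in both years
* (missing values count as large in Stata, so exclude them explicitly)
keep if g10_2010 >= 10 & g10_2018 >= 10 & !missing(g10_2010, g10_2018)

* Drop schools with zero Black students or an all-Black student body
* in either year
drop if black_2010 == 0 | black_2018 == 0
drop if black_2010 == total_2010 | black_2018 == total_2018

list school
```

   In this toy example only school 1 satisfies all three conditions.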


--------------------------------------------------------------------------------
VARIABLE CONSTRUCTION
--------------------------------------------------------------------------------

1. Primary Outcome (Main Models): Black-White OSS Rate Difference (-100 to 100)
   The racial disparity in school discipline is calculated as the difference 
   between the out-of-school suspension (OSS) rates of Black and White students 
   within the same school:
   
   BlackWhiteOSSRateDifference_it (diff_oss_blwh2) = 
   Black_Suspension_Rate_it (oss_rate_bl) - White_Suspension_Rate_it (oss_rate_wh)
   
   Note: This is a school-level measure. Rates represent the percentage of students    
   in each racial group receiving one or more out-of-school suspensions. Schools 
   with zero Black or zero White enrollment in a given year were excluded from 
   suspension disparity calculations.
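
   As a worked illustration on toy data (the rates below are invented; only the 
   variable names come from the dataset):

```stata
clear
input school double(oss_rate_bl oss_rate_wh)
1 12.5 4.0
2  8.0 8.0
3  3.0 6.5
end

* Black-White OSS rate difference, in percentage points
generate double diff_oss_blwh2 = oss_rate_bl - oss_rate_wh
list
```

   The three schools receive differences of 8.5, 0, and -3.5 percentage points. 
   A positive value means Black students were suspended at a higher rate than 
   White students in that school.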

2. Secondary Outcome: Black-White OSS Rate Ratio (Sensitivity Analysis)
   Definition: The ratio of the Black student suspension rate to the White 
   student suspension rate within the same school (irr_oss_blwh2).
   Formula: Rate_Ratio_it = (Black_Suspension_Rate_it) / (White_Suspension_Rate_it)
 
   Note: This is a school-level measure. 
   Adjustments for Zero-Values & Outliers:
   - To address undefined ratios where a group's rate was zero, we added a 
     constant (0.001) to all zero-value discipline rates before calculating 
     the ratio.
   - Extreme values were top-coded at 20 to reduce the influence of outliers.
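
   A minimal sketch of these adjustments on toy data. It assumes the constant 
   replaces a zero rate in either the numerator or the denominator and that 
   rates are on the same scale as oss_rate_bl and oss_rate_wh; the helper 
   variables bl_adj and wh_adj are hypothetical.

```stata
clear
input double(oss_rate_bl oss_rate_wh)
10 5
 6 0
 0 4
30 1
end

* Replace zero rates with the small constant so the ratio is defined
generate double bl_adj = cond(oss_rate_bl == 0, 0.001, oss_rate_bl)
generate double wh_adj = cond(oss_rate_wh == 0, 0.001, oss_rate_wh)

* Rate ratio, top-coded at 20
generate double irr_oss_blwh2 = bl_adj / wh_adj
replace irr_oss_blwh2 = 20 if irr_oss_blwh2 > 20 & !missing(irr_oss_blwh2)
list
```

   In this example the second and fourth rows are top-coded at 20, which shows 
   why the cap matters once a zero denominator is replaced by a small constant.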


3. Key Predictor: Neighborhood Racial Change Categories
   Neighborhoods (defined as 10th-grade School Attendance Boundaries) were 
   categorized based on the growth rate of their Black population from 2010 to 
   2018 relative to the growth rate of their encompassing Metropolitan 
   Statistical Area (MSA):
   - Increasing Black Population: Neighborhood Black Pop growth rate >= 1.5 * MSA growth rate AND is positive.
   - Decreasing Black Population: Neighborhood Black Pop growth rate < -1.5 * MSA growth rate.
   - Stable Black Population: Neighborhood Black Pop growth rate between -1.5 * MSA growth rate and 1.5 * MSA growth rate.
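
   A minimal sketch of this categorization on toy data. The variable names 
   (gr_nbhd, gr_msa, ngh_change) and the numeric codes are hypothetical, and the 
   example assumes a positive MSA growth rate; how the authors handle MSAs with 
   negative growth is not described here.

```stata
clear
input double(gr_nbhd gr_msa)
 0.30 0.10
-0.20 0.10
 0.05 0.10
end

* 1 = decreasing, 2 = stable, 3 = increasing (codes are illustrative)
generate byte ngh_change = 2
replace ngh_change = 3 if gr_nbhd >= 1.5 * gr_msa & gr_nbhd > 0
replace ngh_change = 1 if gr_nbhd < -1.5 * gr_msa

* Missing growth rates must not be classified
replace ngh_change = . if missing(gr_nbhd, gr_msa)
list
```

   With an MSA growth rate of 0.10, the three neighborhoods are coded as 
   increasing, decreasing, and stable, respectively.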


4. Key Contextual Predictors (Stratification Measures)

   A. School/Neighborhood Locale
      - Definition: Categorical variable indicating the urbanicity of the 
        school's location.
      - Source: NCES Common Core of Data (CCD) "Urban-Centric Locale Codes."
      - Construction: We collapsed the standard 12-category NCES locale codes 
        into three aggregate categories:
        1. Urban: City (Locale codes 11, 12, 13)
        2. Suburb/Town: Suburb (Locale codes 21, 22, 23) and Town (31, 32, 33)
        3. Rural: Rural (Locale codes 41, 42, 43)
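
      The collapse can be sketched with recode. The input variable name 
      (locale) and the new variable name (locale3) are hypothetical.

```stata
clear
input locale
11
22
32
43
end

* Collapse the 12 NCES locale codes into three labeled categories
recode locale (11 12 13 = 1 "Urban") (21 22 23 31 32 33 = 2 "Suburb/Town") ///
    (41 42 43 = 3 "Rural"), generate(locale3)
tabulate locale3
```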

   B. Initial Neighborhood White Composition (Predominantly White Status)
      - Definition: A binary indicator capturing whether a neighborhood was 
        "predominantly white" at the baseline of the study period.
      - Source: 2010 Decennial Census (tract-level data apportioned to SABs).
      - Construction:
        - Predominantly White: Neighborhoods where the proportion of White 
          residents was greater than 70% in 2010.
        - Non-Predominantly White: Neighborhoods where the proportion of White 
          residents was 70% or less in 2010.
      - Note: Sensitivity analyses in the appendix (Table S3) explore alternative 
        thresholds (e.g., >60% and >50%), but the main analysis utilizes the 
        >70% cutoff.
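
      A minimal sketch of the >70% indicator on toy data. The variable names 
      (pwhite2010, predwhite70) are hypothetical, and the White share is 
      assumed to be stored as a proportion between 0 and 1.

```stata
clear
input double pwhite2010
0.85
0.70
0.55
end

* 1 = predominantly White (> 70% White in 2010), 0 = otherwise
generate byte predwhite70 = (pwhite2010 > 0.70) if !missing(pwhite2010)
list
```

      Only the first neighborhood is coded 1; a share of exactly 70% falls in 
      the non-predominantly White group. Storing the share as a double avoids 
      float-precision surprises at the cutoff.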

5. Additional Covariates (Controls)
   - School-Level: Total enrollment, racial composition (% Black, % Latine), 
     socioeconomic status (% Free/Reduced Lunch).
   - Neighborhood-Level: Total population size, poverty rate, educational 
     attainment (% Bachelor’s degree or higher).

6. Robustness Checks 

   A. Alternative Predominantly White Definitions
      - Purpose: To test the sensitivity of our findings to the strictness of 
        the "predominantly white" definition.
      - Source: 2010 Decennial Census (tract-level data apportioned to SABs).
      - Construction: We constructed two additional binary indicators using 
        lower thresholds for the White population proportion in 2010:
        1. > 60% White (Variable: majnhw2): Neighborhoods where the proportion 
           of White residents exceeded 60%.
        2. > 50% White (Variable: majnhw): Neighborhoods where the proportion 
           of White residents exceeded 50%.
      - Usage: These alternative measures are used in the Supplementary 
        Information (e.g., Table S3) to demonstrate that the pattern of results 
        holds across different definitions of neighborhood racial composition.

   B. Alternative Measure of Neighborhood Black Change Type
      For our sensitivity analysis (Appendix Figure S5), neighborhood racial 
      change is measured as the percent change in the share of residents who 
      are Black between 2010 and 2018, calculated as the change in the Black 
      population proportion relative to its 2010 level (sgr_pnhb). This 
      measure is categorized into three groups: neighborhoods experiencing 
      increasing Black share (percent change greater than +10 percent), 
      decreasing Black share (percent change less than −10 percent), and 
      stable Black share (percent change between −10 and +10 percent) 
      (ngh_black_10). The measure captures proportional changes in racial 
      composition over time, reflecting relative racial transition rather 
      than absolute population growth. Although conceptually distinct from 
      growth rates based on raw population counts, this measure captures 
      closely related patterns of neighborhood racial change in practice.
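
      A minimal sketch of this alternative measure on toy data. The input names 
      (pnhb2010, pnhb2018) and the numeric codes are hypothetical, and 
      sgr_pnhb is assumed here to be expressed in percent.

```stata
clear
input double(pnhb2010 pnhb2018)
0.20 0.25
0.30 0.25
0.40 0.41
end

* Percent change in the Black population share relative to its 2010 level
generate double sgr_pnhb = 100 * (pnhb2018 - pnhb2010) / pnhb2010

* 1 = decreasing, 2 = stable, 3 = increasing (codes are illustrative)
generate byte ngh_black_10 = 2 if !missing(sgr_pnhb)
replace ngh_black_10 = 3 if sgr_pnhb > 10 & !missing(sgr_pnhb)
replace ngh_black_10 = 1 if sgr_pnhb < -10
list
```

      The three neighborhoods change by 25, about -16.7, and 2.5 percent, and 
      are coded as increasing, decreasing, and stable.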


--------------------------------------------------------------------------------
METHODOLOGICAL NOTE ON DATA MERGING
--------------------------------------------------------------------------------
Spatial Interpolation:
   Because Census data (tracts) do not share common boundaries with School 
   Attendance Boundaries (SABs), the analysis dataset was constructed using 
   population-weighted geographic reapportionment. We used tract-level 2010 
   Decennial Census data to calculate the population weights used to crosswalk 
   tract-level ACS data to the SAB level.
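
   A minimal sketch of population-weighted reapportionment on toy data, one row 
   per tract-by-SAB overlap. All variable names are hypothetical: pop_overlap 
   stands for the 2010 population living in the part of the tract that falls 
   inside the SAB (summed from census blocks), and tract_pov_count stands for a 
   tract-level ACS count. The authors' exact weighting steps may differ.

```stata
clear
input sab_id tract_id double(pop_overlap tract_pop tract_pov_count)
1 101 600 1000 200
1 102 300  300  90
2 101 400 1000 200
end

* Weight = share of the tract's 2010 population that lives inside the SAB
generate double w = pop_overlap / tract_pop

* Allocate the tract-level count to the SAB in proportion to that share
generate double pov_in_sab = w * tract_pov_count

* Sum the allocated pieces to the SAB level and form a rate
collapse (sum) pov_in_sab pop_overlap, by(sab_id)
generate double pov_rate = 100 * pov_in_sab / pop_overlap
list
```

   Tract 101 is split 60/40 between the two SABs, so SAB 1 receives 120 of its 
   200 poor residents plus all 90 from tract 102, and SAB 2 receives the 
   remaining 80.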
